fix: validate connection health and handle missing slots in cluster refresh #944
Open
billshen99 wants to merge 1 commit into redis:main
rueian reviewed on Jan 30, 2026
    conns[addr] = fresh
    // Validate connection health before reusing
    // If connection has error, it will be replaced with a new one
    if cc.conn.Error() == nil {
Collaborator
mux handles reconnections internally, so it is fine for the cluster client to have no Error() check.
Comment on lines +332 to +334
    // Preserve old slot mappings for any slots not covered by the new topology
    // This prevents nil connections when CLUSTER SLOTS returns incomplete data
    for i := 0; i < 16384; i++ {
Collaborator
We should not do this. CLUSTER SLOTS should be the only truth.
Summary
Fixes a critical bug where broken connections were blindly reused during cluster
topology refresh, causing persistent "wrong shard/slot" errors that required pod restarts.
Root Cause
During _refresh(), connections were reused without health validation. Network issues
could break connections, which would then persist in an error state indefinitely
across refresh cycles.
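The reuse-with-validation idea described above can be sketched as follows. This is a minimal illustration, not the client's actual code: the `conn` interface, `fakeConn`, `reuseHealthy`, and the `dial` callback are hypothetical names, and only the `Error()` health check mentioned in this PR is modeled.

```go
package main

import "errors"

// conn is a hypothetical stand-in for the client's connection type.
// Only the Error() health check discussed in this PR is modeled.
type conn interface {
	Error() error
}

// fakeConn is an illustrative implementation used to exercise the sketch.
type fakeConn struct {
	id     int
	broken bool
}

// Error reports a non-nil error when the connection is marked broken.
func (c *fakeConn) Error() error {
	if c.broken {
		return errors.New("connection broken")
	}
	return nil
}

// reuseHealthy builds the connection map for a refreshed topology.
// An existing connection is kept only if it reports no error;
// otherwise dial is called to create a replacement.
func reuseHealthy(old map[string]conn, addrs []string, dial func(addr string) conn) map[string]conn {
	next := make(map[string]conn, len(addrs))
	for _, addr := range addrs {
		if c, ok := old[addr]; ok && c.Error() == nil {
			next[addr] = c // healthy: reuse as-is
			continue
		}
		next[addr] = dial(addr) // missing or broken: replace
	}
	return next
}
```

With this shape, a connection that entered an error state is dropped at the next refresh instead of being carried over, which is the behavior the PR aims for.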
Changes
Validate connection health with Error() before reusing
Add replica-to-master fallback for missing replica slots
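One way to read the replica-to-master fallback is sketched below, assuming a per-slot lookup table. The `slotTable` type, its fields, and `pick` are hypothetical and chosen for illustration; the PR's actual data structures may differ.

```go
package main

// slotCount is the number of hash slots in a Redis cluster.
const slotCount = 16384

// slotTable is a hypothetical per-slot routing table. An empty string
// means no node of that role is known for the slot.
type slotTable struct {
	masters  [slotCount]string
	replicas [slotCount]string
}

// pick returns the address to use for a slot. When a replica is
// preferred but none is recorded for the slot, it falls back to the
// master. The boolean is false when no node is known at all.
func (t *slotTable) pick(slot uint16, preferReplica bool) (string, bool) {
	if slot >= slotCount {
		return "", false
	}
	if preferReplica {
		if r := t.replicas[slot]; r != "" {
			return r, true
		}
		// No replica recorded for this slot: fall through to the master.
	}
	m := t.masters[slot]
	return m, m != ""
}
```

In this sketch a missing replica entry never yields an empty address as long as the master for that slot is known, so callers do not need to handle a nil replica connection separately.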
Impact
This pull request improves the reliability of the Redis cluster client by ensuring that unhealthy (broken) connections are not reused after a topology refresh, and by making slot mapping more robust when the cluster topology is incomplete. It also adds comprehensive tests to validate these behaviors.
Testing enhancements: